Multi-prototype Chinese Character Embedding
نویسندگان
چکیده
Chinese sentences are written as sequences of characters, which are elementary units of syntax and semantics. Characters are highly polysemous in forming words. We present a position-sensitive skip-gram model to learn multi-prototype Chinese character embeddings, and explore the usefulness of such character embeddings to Chinese NLP tasks. Evaluation on character similarity shows that multi-prototype embeddings are significantly better than a single-prototype baseline. In addition, used as features in the Chinese NER task, the embeddings result in a 1.74% F-score improvement over a state-of-the-art baseline.
منابع مشابه
A Prototype of Multi-Font Printed Chinese Character Reader
An approach to multi-font printed Chinese character recognition is proposed in this paper. The problems of inputting image of characters, preprocessing, character segmentati~n~feature extraction as well as character classification have been discussed. According to the characteristics of multi-font printed Chinese characters,the number of cutting across strokes, the external and internal areas w...
متن کاملMulti-Granularity Chinese Word Embedding
This paper considers the problem of learning Chinese word embeddings. In contrast to English, a Chinese word is usually composed of characters, and most of the characters themselves can be further divided into components such as radicals. While characters and radicals contain rich information and are capable of indicating semantic meanings of words, they have not been fully exploited by existin...
متن کاملRadical-Enhanced Chinese Character Embedding
We present a method to leverage radical for learning Chinese character embedding. Radical is a semantic and phonetic component of Chinese character. It plays an important role as characters with the same radical usually have similar semantic meaning and grammatical usage. However, existing Chinese processing algorithms typically regard word or character as the basic unit but ignore the crucial ...
متن کاملA Study on BLSTM-RNN-based Chinese Prosodic Structure Prediction in a Unified Framework with Character-level Features
In Text-to-Speech system, prosodic attributes have to be predicted only from input text. The accuracy of prosody prediction has a significant effect on the naturalness of synthesized speech of Chinese. In this paper, we explore using neural networks to predict prosodic boundaries from Chinese text without task specific knowledge or sophisticated feature engineering. We examine sequence characte...
متن کاملNeural Domain Adaptation with Contextualized Character Embedding for Chinese Word Segmentation
There has a large scale annotated newswire data for Chinese word segmentation. However, some research proves that the performance of the segmenter has significant decrease when applying the model trained on the newswire to other domain, such as patent and literature. The same character appeared in different words may be in different position and with different meaning. In this paper, we introdu...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2016